System and method for probabilistic matching of multiple event logs to a single real-world ad serve event

ABSTRACT

A system and method for accurately matching corresponding DSP event data and Ad-Server event data associated with a single real-world ad serve event by (a) pairing DSP event data and Ad-Server event data into data pairs, (b) comparing various field data in associated source fields from each of the DSP event data and Ad-Server event data to determine if the field data is a match or unmatch, and (c) based on the likelihood that a match of field data in a particular source field indicates an overall event match, which is determined using a Bayesian analysis, determining the probability that the DSP event data and Ad-Server event data in the data pair truly correspond to the same single real-world ad serve event.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 62/786,533, filed on Dec. 30, 2018, and entitled “Probabilistic Matching Bayesian Analysis.” Such application is incorporated by reference herein in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not applicable.

BACKGROUND

In programmatic digital advertising, ad displays to digital users (each such display of an ad to a digital user is known as an “impression”) are automatically auctioned off when a digital user views an eligible display space on a browser or other content-viewing application. In a few milliseconds, a demand side platform (“DSP”) processes bids for the impression on behalf of advertisers, and an Ad Server then delivers the winning advertiser's advertisement to the user's device. Both the DSP and the Ad Server provide event logs of all impressions to advertisers. For each such event, the two event logs contain different but related data, including multiple fields about the geographic location of the user, the time and date of the ad serve, characteristics of the user device's hardware and software, cost information, identifiers connected to units of the advertiser's larger strategy, and differing randomized identifiers for the user and/or the user's device.

Generally speaking, while this current technology allows for the generation of the two related event logs (one for the DSP and one for the Ad Server), the technology does not allow for connecting both event logs to a single real-world impression. Even if some association of the two event logs were possible, current technology does not provide any means for corroborating or verifying that the association between the two event logs is accurate. The ability to connect both a DSP log event and an Ad Server log event to a single real-world impression (and to accurately corroborate the relationship between the paired DSP and Ad Server log events) allows for the connection of the two log events' associated data with the single real-world ad serve event. This larger set of associated data values in turn enables associating ad serve events over time with individual users or among cohorts of meaningfully similar individual users without violating any user's privacy.

BRIEF SUMMARY OF THE INVENTION

The present invention is generally directed to a system and method for matching corresponding DSP log events and Ad Server log events associated with a single real-world impression. The present invention allows for this matching of DSP log events and Ad Server log events into pairs corresponding to the same real-world impression by creating and quantifying two novel factors (the independent geographic closeness factor and the sole rightful heir factor) from the event log data and applying probability and combinatoric game theoretical analysis to those factors. A pair that does correspond to the same real-world impression may be referred to herein as a “Match” (likewise, a pair that does not correspond to the same real-world impression is considered an “Unmatch”). By connecting pairs of DSP and Ad Server log events to a single real-world impression (i.e., determining, for each of a number of candidate pairs, whether that specific pair is a Match), and thus connecting the two log events' associated data, the invention creates a larger set of data values associated with a single impression (those from both the DSP event log and the Ad Server event log). This larger set of associated data values in turn enables associating impressions over time with individual users or among cohorts of meaningfully similar individual users without violating any user's privacy.

In one embodiment, the invention uses a mix of deterministic and probabilistic record matching, starting with events recorded over a short time period (for instance, 24 hours) in both the Ad Server and DSP logs. The invention uses an algorithm that, first, reduces the search space by segregating all events from the Ad Server's log during the selected time period and all events from the DSP's log during the same time period into “unit groups” corresponding to individual advertisers and to discrete units within each advertiser's larger strategy, using certain identifying values in the log data. It then defines a time-difference window W_(T) within which most Match pairs are expected to fall. It then compares every event from the Ad Server's log in a given unit group with every event from the DSP's log in the same group, first filtering out all pairs that do not fall within the time-difference window W_(T) and then comparing the other field values for all remaining pairs. This creates a series of DSP-Ad Server candidate pairs, and each candidate pair can be classified as either a Match (the pair of log events does correspond to the same real-world event of a user being served an ad, or “impression”) or an Unmatch (the pair of log events does not correspond to the same real-world impression). Based on a comparison of the data values of the DSP log and Ad Server log of a particular candidate pair, the invention can calculate the probability that the DSP-Ad Server pair is a Match, by first calculating the probability that each given pairwise field value would appear if the pair were an Unmatch. These and other objects, features, and advantages of the present invention will become better understood from a consideration of the following detailed description of the preferred embodiments and appended claims in conjunction with the drawings as described following:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram representing DSP Event and Ad-Server Event pairs, showing example source fields (state, city, operating system, etc.) for each event of the event pair.

FIG. 2 is a diagram representing the comparison of the field data in each of the example source fields for a paired DSP Event and Ad-Server Event, indicating whether the field data in each source field is a match and assigning a two-value Pair Attribute based on whether the field data matches.

FIG. 3 is a table showing a number of event data sets and associated source fields, which are assigned two-value Pair Attributes based on the field data match or field data unmatch of the data in the associated source field, and wherein the two-value Pair Attributes are organized into a data matrix.

FIG. 4 is a diagram showing the system of the present invention, including the consumer, the DSP and its event log data, the Ad Server and its event log data, and the combination of DSP event data and Ad Server event data into a DSP-Ad Server Pair.

FIG. 5 is a modification of FIG. 4 showing the inclusion of a Common Server.

DETAILED DESCRIPTION OF THE INVENTION

Generally speaking, the present invention in certain implementations is directed to a system and method for matching corresponding DSP log events 4 and Ad Server log events 6 associated with a single real-world impression, as shown in the figures. The invention utilizes a series of steps to create and quantify two novel factors (the independent geographic closeness factor and the sole rightful heir factor) from the event log data 4, 6 and applies a probability analysis to those factors to determine whether a particular DSP-Ad Server Pair 2a is a Match (meaning that both the DSP log event 4 and the Ad Server log event 6 of the particular Pair 2a do, in fact, correspond to the same real-world impression) or an Unmatch (meaning that the Pair 2a does not correspond to the same real-world impression). Generally speaking, the invention in certain implementations includes the following broad steps: (a) log events from both data sources are segregated into smaller unit groups corresponding to individual advertisers and to discrete, identifiable units within each advertiser's overall strategy, (b) a time-difference window W_(T) is defined, within which most Match pairs are expected to fall, (c) DSP-Ad Server candidate Pairs 2 (which may be referred to herein simply as “Pairs”) that fall within the time-difference window W_(T) are created for each group, (d) data fields 8 of the Pairs 2 are compared to create a row of two-valued Pair Attributes for each Pair 2, (e) for each row of two-valued Pair Attributes, the probability that the two events correspond to the same impression is determined, and (f) the pairwise match probabilities produced in step (e) are compared for every potential matching pair, and the sole rightful heir factor sorts candidate pairs into Match and Unmatch.
One or more of these steps may be modified, eliminated, or substituted depending on the user's desired use, and it is understood that one or more of these steps may have a series of sub-steps that achieve the goal of the particular step, as described more fully below. In any event, this general method is utilized by the invention in certain implementations to determine whether a Pair is a Match or an Unmatch.

As noted above, the preferred first step in determining whether a particular Pair 2a is a Match or an Unmatch (and thus whether the Pair 2a does or does not correspond to the same real-world impression) is reducing the search space by segregating events into unit groups and then applying a filter that keeps only those pairs that fall within the time-difference window W_(T) for further analysis. These two steps enable creating the candidate Pairs 2 to be analyzed. In this regard, the invention uses a mix of deterministic and probabilistic record matching, starting with events recorded over a short time period (preferably twenty-four hours) in both the Ad Server and DSP logs. This filtering function reduces the field of pairs to a more manageable field of candidate pairs by segregating all events from the Ad Server's log during a selected time period (for example, twenty-four hours) and all events from the DSP's log during the same time period into “unit groups” corresponding to individual advertisers and to discrete units within each advertiser's larger strategy, using certain identifying values in the log data. A time-difference window W_(T) is defined, identifying a window of time in which most Match pairs are expected to fall. Every event from the Ad Server's log in a given unit group is compared with every event from the DSP's log in the same unit group, and all pairs that do not fall within the time-difference window W_(T) are filtered out (as they are highly unlikely to be Match pairs). This creates a series of candidate Pairs, and the field values of the other source fields are compared for all of the candidate Pairs. At this stage, each Pair 2 is either a Match or an Unmatch, but that classification is not known until the remaining portion of the implementation is utilized to make the determination. Examples of these Pairs 2 are shown in the diagram of FIG. 1.
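The unit-group segregation and time-window filter described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `LogEvent` record, the `unit_group` key (for example, an advertiser and strategy-unit tuple), and the treatment of W_(T) as a symmetric maximum absolute timestamp difference are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from itertools import product


@dataclass
class LogEvent:
    """One row from a DSP or Ad Server log (hypothetical record layout)."""
    event_id: str
    unit_group: tuple  # assumed key, e.g. (advertiser_id, strategy_unit_id)
    timestamp: datetime
    fields: dict = field(default_factory=dict)  # source fields: state, city, os, ...


def group_by_unit(events):
    """Segregate events into unit groups keyed by their identifying values."""
    groups = defaultdict(list)
    for event in events:
        groups[event.unit_group].append(event)
    return groups


def candidate_pairs(dsp_events, adserver_events, window: timedelta):
    """Compare every DSP event with every Ad Server event in the same unit
    group and keep only pairs whose timestamps differ by at most `window`
    (treated here as a symmetric time-difference window W_T)."""
    dsp_groups = group_by_unit(dsp_events)
    ads_groups = group_by_unit(adserver_events)
    pairs = []
    for group, dsp_list in dsp_groups.items():
        ads_list = ads_groups.get(group)
        if not ads_list:
            continue  # no Ad Server events in this unit group
        for dsp, ads in product(dsp_list, ads_list):
            if abs(dsp.timestamp - ads.timestamp) <= window:
                pairs.append((dsp, ads))
    return pairs
```

Because the all-pairs comparison runs only inside each unit group, the work grows with the sum of the per-group products rather than with the product of the two full log sizes.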

As noted previously, the DSP event logs 4 and Ad-Server event logs 6 to be paired and compared for matching are generated by a demand side platform (DSP) 7 and an Ad Server 9, respectively, as shown in FIG. 4. These event logs 4, 6 contain data associated with ad-serve events, or the digital display of advertisements to users 3 on user devices 5 (otherwise known as “impressions”). In this regard, the system and method of the present invention can be utilized within existing ad-serve technological environments (such as those that use real-time bidding technology such as DSPs and SSPs over an ad network) to improve the tracking and identification of digital consumer activity without violating the privacy of any such consumers. To create the event log data sets 4, 6 for which the present invention is utilized, digital users 3 access the internet from user devices 5 (which may, for example, incorporate a browser or other content-viewing application), and when a user 3 views an eligible display space on the user device 5, the DSP 7 processes bids for the ad display on behalf of advertisers and the Ad Server 9 then delivers the ad to the user's device 5. Each of these generates log data 4, 6 for the event, and this happens for multiple ad events associated with multiple different users 3. These event logs 4, 6 may be stored in a database and, in one embodiment, may be stored on or transferred to a common server 11 (as shown, for example, in FIG. 5), where the record matching of the present invention is applied to compare the event logs 4, 6 and determine which of the Pairs 2 are truly Matches.

Each event 4, 6 from each source set has associated with it a series of data values that are associated with specific data fields 8. For example, both data sets report geographic location (state, city, zip code, etc.), time, the website where the ad was delivered, and a number of other data values for each impression event 4, 6. The two data sets 4, 6 contain many such differing fields, but the preferred implementation focuses on comparing those that appear to be “like for like,” which may include, for example, the following: (a) the timestamp of the impression event, (b) the state in which the user was physically located for the impression event, (c) the metro area in which the user was physically located for the impression event, (d) the city in which the user was physically located for the impression event, (e) the 5-digit zip code in which the user was physically located for the impression event, (f) the operating system being used by the user, (g) the browser being used by the user, and (h) the site on which the ad was delivered, as shown, for example, in FIGS. 1-2. Each of these fields may be referred to as a “source field,” and comparing the source fields 8 (as shown, for example, in FIG. 2) for each Pair 2 creates a row of “pair fields” corresponding to each such Pair 2. Most of the pair fields will be two-valued and will include (i) a simple Boolean variable indicating whether the individual source fields are direct matches (as described more fully below and as shown, for example, in the diagram of FIG. 2 and the table in FIG. 3), and (ii) the value on which the source fields match, if they do indeed match.

For all pairs in a given unit group that fall within the time-difference window W_(T), the source fields 8 (for example, those listed above, or other similar fields) of each of the two individual events in each Pair 2 are compared to create Pair Attributes for each such Pair 2. Each Pair Attribute may include two parts: a Boolean value indicating whether the values in the corresponding source fields 8 are the same or different, as shown for example in FIG. 2, and the matched value itself. One value (such as a “1”, for example) for the Boolean part of the Pair Attribute indicates that the values in the corresponding source fields 8 of a Pair 2 match, while a second value (such as a “0”, for example) indicates that the values in the corresponding source fields 8 do not match. In the case of a source field match, the second part of the Pair Attribute indicates the value on which the two source fields match. In the case of the source fields not matching, the second part of the Pair Attribute is simply “null.”
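A minimal sketch of building the two-part Pair Attributes for one Pair, together with the Boolean row of a FIG. 3-style data matrix, might look like the following. The field names in `SOURCE_FIELDS` and both function names are hypothetical, and treating a value that is missing from either log as a non-match is an assumption the text does not specify.

```python
# Illustrative source-field names; a real deployment would map each log's
# own column names onto these.
SOURCE_FIELDS = ("state", "metro", "city", "zip", "os", "browser", "site")


def pair_attributes(dsp_fields: dict, ads_fields: dict,
                    source_fields=SOURCE_FIELDS) -> dict:
    """Return {field: (flag, value)}: (1, shared value) when both logs report
    the same value, otherwise (0, None), i.e. the "null" second part."""
    attrs = {}
    for name in source_fields:
        a, b = dsp_fields.get(name), ads_fields.get(name)
        if a is not None and a == b:
            attrs[name] = (1, a)
        else:
            attrs[name] = (0, None)  # includes values missing from either log
    return attrs


def matrix_row(attrs: dict, source_fields=SOURCE_FIELDS) -> list:
    """Boolean columns for one Pair, i.e. one row of the data matrix."""
    return [attrs[name][0] for name in source_fields]
```

With a DSP event in California and an Ad Server event in New York, as in the top pair of FIG. 2, the `state` attribute comes out as `(0, None)` while any agreeing fields keep their shared value.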

A data matrix is thus formed with one axis (such as the rows) corresponding to a single candidate Pair 2 and the other axis (for example, the columns) providing the Pair Attributes, which, as noted above, may include (a) Boolean values representing whether the values in the given source fields 8 in the individual logs match (as indicated by a “1”) or do not match (as indicated by a “0”) and (b) the value on which any such source field match occurs. An example of such a matrix is provided in FIG. 3. It should be noted, of course, that this matching of values in the source fields 8 (e.g., two events happened in the same city or on the same type of device) is not the same as the ultimate determination of whether the Pairs 2 themselves are a Match or an Unmatch (i.e., correspond to the same ad-serve event or not). While the match/unmatch nature of the source field 8 values is preferably denoted by the Boolean values 1 and 0, the Match/Unmatch classification of the Pairs 2 themselves may be denoted by different values, such as M (for Match) and U (for Unmatch), for purposes of describing the invention with clarity. Where the real-world match (or non-match) of the information represented by a single field for a given pair must be considered separately from whether the field values reported by the two event logs match (a 1 or a 0), lower case “match” and “unmatch,” or “m” and “u,” may be used for clarity.

As shown in FIG. 3, the matrix may include a number of columns corresponding to the number of source fields 8 compared for each of the events in a source set. As shown in column four, for example, the values “1” or “0” indicate whether the state in which the user was physically located for the impression event was the same or different for each of the Pairs 2. As shown, the first Pair (which corresponds with the values shown in row 1) has a “State Match” column value of 0, indicating that the Ad-Server event and DSP event reported different states (an example is shown in FIG. 2, where, for the top pair, the DSP event data shows the state of California and the Ad Server event data shows the state of New York, which is not a match). For purposes of the preferred embodiment, the values in the various source fields (lettered as in the list of source fields above) are said to match (“1”) if the following are found to be true: (b) the states reported by the Ad Server event and the DSP event are the same, (c) the metros reported by the Ad Server event and the DSP event are the same, (d) the cities reported by the Ad Server event and the DSP event are the same, (e) the zip codes, and separately the first 4, 3, and 2 digits of the zip codes, reported by the Ad Server event and the DSP event are the same, (f) the operating systems reported by the Ad Server event and the DSP event are the same, (g) the browsers reported by the Ad Server event and the DSP event are the same, and (h) the sites reported by the Ad Server event and the DSP event are the same. From each such row of Pair Attributes, the probability that the two events forming the Pair correspond to the same real-world event of a user being delivered an impression (i.e., whether the Pair 2 is a Match or an Unmatch) can be determined.
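The zip-code rule in item (e) turns one source field into several Boolean columns. A minimal sketch, assuming zip codes arrive as digit strings; the helper name and the `zip5`, `zip4`, `zip3`, `zip2` column names are hypothetical:

```python
def zip_prefix_matches(zip_a, zip_b, lengths=(5, 4, 3, 2)) -> dict:
    """Return one 0/1 column per prefix length: 1 when the first n
    characters of both zip codes are present and identical, else 0."""
    result = {}
    for n in lengths:
        same = (
            zip_a is not None
            and zip_b is not None
            and len(zip_a) >= n
            and len(zip_b) >= n
            and zip_a[:n] == zip_b[:n]
        )
        result[f"zip{n}"] = int(same)
    return result
```

For example, `zip_prefix_matches("94608", "94612")` yields 0 for `zip5` and `zip4` but 1 for `zip3` and `zip2`, so nearby but distinct zip codes still share the coarser prefix columns.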

The Bayesian analysis of the present invention utilizes the principles of Bayes' Theorem, which provides the following:

$P(H \mid E) = \frac{P(E \mid H) \times P(H)}{P(E)}, \quad \text{for } H = \text{hypothesis and } E = \text{evidence}$

For the present invention, the hypothesis for each Pair is the state “Match,” represented with a capital “M” (and, where necessary, “Unmatch” is represented with a capital “U”). The evidence (E) used to inform the truth or falsity of the hypothesis (that a selected Pair is a Match) consists of the row of Pair Attribute values corresponding to that Pair, as portrayed in the example table shown in FIG. 3. For each Pair, the question is “what is the probability that the two log events in this pair represent the same real-world event, given the values that make up the pair attributes associated with this pair?,” or, put in terms of Bayes' Theorem, “what is P(H|E)?” For each Pair, then, the above equation is considered in the following terms:

$P(M \mid E) = 1 - P(U \mid E) = 1 - \frac{P(E \mid U) \times P(U)}{P(E)} \qquad \text{(Equation 1)}$

for M = Match and E = the set of Comparison Field values {e_i}. This equation may be referred to as Equation 1. Every Pair is either a Match or an Unmatch, from which we know that P(M|E) = 1 − P(U|E). The three terms appearing on the right side of Equation 1 are discussed more fully below.
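Equation 1 can be evaluated directly once its three right-hand terms are available. The sketch below assumes they have already been estimated (their estimation is discussed below) and simply applies Bayes' Theorem to the Unmatch hypothesis; the function name is hypothetical.

```python
def p_match_given_evidence(p_e_given_u: float, p_u: float, p_e: float) -> float:
    """Equation 1: P(M|E) = 1 - P(E|U) * P(U) / P(E).

    All three inputs are assumed to be estimated elsewhere; this function
    only combines them and rejects values that cannot be probabilities."""
    if not 0.0 < p_e <= 1.0:
        raise ValueError("P(E) must lie in (0, 1]")
    if not (0.0 <= p_e_given_u <= 1.0 and 0.0 <= p_u <= 1.0):
        raise ValueError("P(E|U) and P(U) must lie in [0, 1]")
    # Bayes' Theorem for the Unmatch hypothesis gives P(U|E).
    p_u_given_e = p_e_given_u * p_u / p_e
    if p_u_given_e > 1.0:
        raise ValueError("inconsistent inputs: P(E|U) * P(U) exceeds P(E)")
    # Every Pair is a Match or an Unmatch, so P(M|E) = 1 - P(U|E).
    return 1.0 - p_u_given_e
```

For example, with P(E|U) = 0.01, P(U) = 0.9 and P(E) = 0.02, the sketch gives P(U|E) = 0.45 and therefore P(M|E) = 0.55.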

First, the term P(E|U) can be discussed in detail. While the present invention does consider whether the Pairs are a Match given the entire row of values (E = {e_1, e_2, …, e_n}) for all Comparison Fields, the Comparison Fields must first be analyzed individually (the individual e_i values). Assuming that the e_i are independent of one another given U, the probability corresponding to the entire row, P(E|U), equals the product of the individual probabilities P(e_i|U) for each Comparison Field value in that row:

$P(E \mid U) = \prod_{e_i \in E} P(e_i \mid U) \qquad \text{(Equation 2)}$

This factorization holds only if the e_i are mutually independent conditional on U. There is one large exception to this independence condition: because the four geographic Comparison Fields (State, Metro, City, Zip) are not independent of one another, they must be combined into a single aggregate Comparison Field that both (a) is independent of the other e_i in Equation 2, and (b) preserves the information contained in the four geographic e_i. This single aggregate geographic Comparison Field (referred to as e_G) is then included among the e_i in place of the four previous geographic fields, and along with the non-geographic Comparison Fields, in the product of Equation 2:

$P(E \mid U) = P(e_G \mid U) \times \prod_{e_i \in E,\ e_i \notin G} P(e_i \mid U) \qquad \text{(Equation 3)}$

where G is the set of four geographic Comparison Fields and e_G is the vector of the four geographic fields Region, Metro, City, and Zip. Equation 3 is merely a special case of Equation 2 in which the probability corresponding to e_G is written out separately; it could equivalently be expressed in the form of Equation 2 by inserting e_G as one of the e_i.
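Under the conditional-independence assumption, Equation 3 reduces to a product of per-field factors. A minimal sketch, assuming P(e_G|U) and each non-geographic P(e_i|U) have already been computed for the Pair's observed field values; the function name and the dictionary input are hypothetical:

```python
import math


def p_evidence_given_unmatch(p_geo_given_u: float, p_other_given_u: dict) -> float:
    """Equation 3: P(E|U) = P(e_G|U) * product of P(e_i|U) over the
    non-geographic Comparison Fields.

    Accumulating logarithms keeps intermediate products from underflowing;
    the final exp can still return 0.0 for extremely small likelihoods."""
    factors = [p_geo_given_u, *p_other_given_u.values()]
    if any(p < 0.0 or p > 1.0 for p in factors):
        raise ValueError("every factor must be a probability in [0, 1]")
    if any(p == 0.0 for p in factors):
        return 0.0  # a single zero factor makes the whole product zero
    return math.exp(sum(math.log(p) for p in factors))
```

Keeping e_G as one factor, rather than four separate geographic factors, is what preserves the independence assumption that the product relies on.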

P(e_G|U) can be defined in terms of its constituent fields, as shown in Equation 4 below:

$P(e_G \mid U) = P(e_R \mid U) \times P(e_M \mid e_R \cap U) \times P(e_C \mid e_M \cap e_R \cap U) \times P(e_Z \mid e_C \cap e_M \cap e_R \cap U) \qquad \text{(Equation 4)}$

Definitions for the expansion of Equation 4 terms are shown below:

- e_G = [e_R, e_M, e_C, e_Z]
- e_R = 1(DCM Region = TTD Region), a binary indicator of the Region fields in a Pair matching
- e_M = 1(DCM Metro = TTD Metro)
- e_C = 1(DCM City = TTD City)
- e_Z = 1(DCM Zip = TTD Zip)
- R = region value in a single Source Event
- M = metro value in a single Source Event
- C = city value in a single Source Event
- Z = zip value in a single Source Event
- RM = vector of region and metro values in a single Source Event, such as [California, San Francisco Bay Area]
- RMC = vector of region, metro, and city values in a single Source Event, such as [California, San Francisco Bay Area, Emeryville]
- R2 (also written RR) = vector containing the R values of both Source Events in a Pair, such as [California, Texas]
- RM2 = vector containing the RM values of both Source Events in a Pair, such as [California, San Francisco Bay Area, Texas, Dallas Ft. Worth]
- RMC2 = vector containing the RMC values of both Source Events in a Pair, such as [California, San Francisco Bay Area, Emeryville, Texas, Dallas Ft. Worth, Arlington]
- i_k = where i indicates a pairwise vector such as RR, the value from the k-th Set's Event comprising i, such as the DCM Source Event's R value in RR

Expansion of the Equation 4 terms is discussed below. First, P(e_R|U) can be defined for two possible cases:

$P(e_R = 1 \mid U) = \sum_{R:\, i=1}^{n} \prod_{\text{Set}:\, j=1}^{2} \frac{\#\,\text{Events in Set } j \text{ with } R = i}{\#\,\text{Events in Set } j} \quad \text{for } n \text{ unique values of } R \qquad \text{(Case A1)}$

$P(e_R = 0 \mid U) = 1 - P(e_R = 1 \mid U) \qquad \text{(Case A2)}$

Next, P(e_M | e_R ∩ U) can be defined for four possible cases:

$P(e_M = 1 \mid e_R = 1 \cap U) = \sum_{R:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 1,\ R = i}{\#\,\text{Pairs with } e_R = 1} \times \sum_{M:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } R = i,\ M = j}{\#\,\text{Events in Set } k \text{ with } R = i} \right] \quad \text{for } n \text{ unique values of } R;\ m \text{ unique values of } M \qquad \text{(Case B1)}$

$P(e_M = 0 \mid e_R = 1 \cap U) = 1 - P(e_M = 1 \mid e_R = 1 \cap U) \qquad \text{(Case B2)}$

$P(e_M = 1 \mid e_R = 0 \cap U) = \sum_{RR:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 0,\ RR = i}{\#\,\text{Pairs with } e_R = 0} \times \sum_{M:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } R = i_k,\ M = j}{\#\,\text{Events in Set } k \text{ with } R = i_k} \right] \quad \text{for } n \text{ unique values of } RR \text{ in Pairs with } e_R = 0;\ m \text{ unique values of } M \qquad \text{(Case B3)}$

$P(e_M = 0 \mid e_R = 0 \cap U) = 1 - P(e_M = 1 \mid e_R = 0 \cap U) \qquad \text{(Case B4)}$

Similarly, P(e_C | e_R ∩ e_M ∩ U) can be defined for eight possible cases:

$P(e_C = 1 \mid e_R = 1 \cap e_M = 1 \cap U) = \sum_{RM:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 1,\ e_M = 1,\ RM = i}{\#\,\text{Pairs with } e_R = 1,\ e_M = 1} \times \sum_{C:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RM = i,\ C = j}{\#\,\text{Events in Set } k \text{ with } RM = i} \right] \quad \text{for } n \text{ unique values of } RM \text{ in Pairs with } e_R = 1,\ e_M = 1;\ m \text{ unique values of } C \qquad \text{(Case C1)}$

$P(e_C = 0 \mid e_R = 1 \cap e_M = 1 \cap U) = 1 - P(e_C = 1 \mid e_R = 1 \cap e_M = 1 \cap U) \qquad \text{(Case C2)}$

$P(e_C = 1 \mid e_R = 1 \cap e_M = 0 \cap U) = \sum_{RM2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 1,\ e_M = 0,\ RM2 = i}{\#\,\text{Pairs with } e_R = 1,\ e_M = 0} \times \sum_{C:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RM = i_k,\ C = j}{\#\,\text{Events in Set } k \text{ with } RM = i_k} \right] \qquad \text{(Case C3)}$

$P(e_C = 0 \mid e_R = 1 \cap e_M = 0 \cap U) = 1 - P(e_C = 1 \mid e_R = 1 \cap e_M = 0 \cap U) \qquad \text{(Case C4)}$

$P(e_C = 1 \mid e_R = 0 \cap e_M = 1 \cap U) = \sum_{RM2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 0,\ e_M = 1,\ RM2 = i}{\#\,\text{Pairs with } e_R = 0,\ e_M = 1} \times \sum_{C:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RM = i_k,\ C = j}{\#\,\text{Events in Set } k \text{ with } RM = i_k} \right] \qquad \text{(Case C5)}$

$P(e_C = 0 \mid e_R = 0 \cap e_M = 1 \cap U) = 1 - P(e_C = 1 \mid e_R = 0 \cap e_M = 1 \cap U) \qquad \text{(Case C6)}$

$P(e_C = 1 \mid e_R = 0 \cap e_M = 0 \cap U) = \sum_{RM2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 0,\ e_M = 0,\ RM2 = i}{\#\,\text{Pairs with } e_R = 0,\ e_M = 0} \times \sum_{C:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RM = i_k,\ C = j}{\#\,\text{Events in Set } k \text{ with } RM = i_k} \right] \qquad \text{(Case C7)}$

$P(e_C = 0 \mid e_R = 0 \cap e_M = 0 \cap U) = 1 - P(e_C = 1 \mid e_R = 0 \cap e_M = 0 \cap U) \qquad \text{(Case C8)}$
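Case A1 and the matched-parent Cases B1 and C1 share one pattern: a frequency-weighted sum, over the parent values seen in matched Pairs, of the chance that two independently drawn events (one from each Set) with that parent value also agree on the child field. The sketch below covers only those matched-parent cases; the mismatched-parent cases (B3, C3, C5, C7), which use the per-Set values i_k, are not implemented. Function names and input shapes are assumptions.

```python
from collections import Counter


def p_values_agree(values_a, values_b) -> float:
    """Case A1 form: sum over shared values v of P_A(v) * P_B(v), where P_A
    and P_B are relative frequencies in Set A and Set B. This is the chance
    that one event drawn at random from each Set reports the same value."""
    if not values_a or not values_b:
        return 0.0
    counts_a, counts_b = Counter(values_a), Counter(values_b)
    n_a, n_b = len(values_a), len(values_b)
    # Values present in only one Set contribute zero, so the intersection suffices.
    return sum((counts_a[v] / n_a) * (counts_b[v] / n_b)
               for v in counts_a.keys() & counts_b.keys())


def p_child_agrees_given_parent_match(events_a, events_b,
                                      matched_parent_values) -> float:
    """Case B1/C1 form. events_a and events_b hold (parent, child) tuples,
    one per event in each Set, e.g. (R, M) for Case B1 or (RM, C) for C1.
    matched_parent_values lists the shared parent value of every Pair whose
    parent fields match (e_R = 1 for B1; e_R = e_M = 1 for C1)."""
    weights = Counter(matched_parent_values)
    total_pairs = sum(weights.values())
    result = 0.0
    for parent, n_pairs in weights.items():
        children_a = [c for p, c in events_a if p == parent]
        children_b = [c for p, c in events_b if p == parent]
        # Weight = share of matched Pairs that carry this parent value.
        result += (n_pairs / total_pairs) * p_values_agree(children_a, children_b)
    return result
```

Applying `p_values_agree` to the two Sets' region values gives Case A1 directly; Cases A2, B2 and C2 then follow as one minus the corresponding result.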

In addition, P(e_Z | e_R ∩ e_M ∩ e_C ∩ U) can be defined for 16 possible cases:

$P(e_Z = 1 \mid e_R = 1 \cap e_M = 1 \cap e_C = 1 \cap U) = \sum_{RMC:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 1,\ e_M = 1,\ e_C = 1,\ RMC = i}{\#\,\text{Pairs with } e_R = 1,\ e_M = 1,\ e_C = 1} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i} \right] \qquad \text{(Case D1)}$

$P(e_Z = 0 \mid e_R = 1 \cap e_M = 1 \cap e_C = 1 \cap U) = 1 - P(e_Z = 1 \mid e_R = 1 \cap e_M = 1 \cap e_C = 1 \cap U) \qquad \text{(Case D2)}$

$P(e_Z = 1 \mid e_R = 0 \cap e_M = 1 \cap e_C = 1 \cap U) = \sum_{RMC2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 0,\ e_M = 1,\ e_C = 1,\ RMC2 = i}{\#\,\text{Pairs with } e_R = 0,\ e_M = 1,\ e_C = 1} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i_k,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i_k} \right] \qquad \text{(Case D3)}$

$P(e_Z = 0 \mid e_R = 0 \cap e_M = 1 \cap e_C = 1 \cap U) = 1 - P(e_Z = 1 \mid e_R = 0 \cap e_M = 1 \cap e_C = 1 \cap U) \qquad \text{(Case D4)}$

$P(e_Z = 1 \mid e_R = 1 \cap e_M = 0 \cap e_C = 1 \cap U) = \sum_{RMC2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 1,\ e_M = 0,\ e_C = 1,\ RMC2 = i}{\#\,\text{Pairs with } e_R = 1,\ e_M = 0,\ e_C = 1} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i_k,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i_k} \right] \qquad \text{(Case D5)}$

$P(e_Z = 0 \mid e_R = 1 \cap e_M = 0 \cap e_C = 1 \cap U) = 1 - P(e_Z = 1 \mid e_R = 1 \cap e_M = 0 \cap e_C = 1 \cap U) \qquad \text{(Case D6)}$

$P(e_Z = 1 \mid e_R = 1 \cap e_M = 1 \cap e_C = 0 \cap U) = \sum_{RMC2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 1,\ e_M = 1,\ e_C = 0,\ RMC2 = i}{\#\,\text{Pairs with } e_R = 1,\ e_M = 1,\ e_C = 0} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i_k,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i_k} \right] \qquad \text{(Case D7)}$

$P(e_Z = 0 \mid e_R = 1 \cap e_M = 1 \cap e_C = 0 \cap U) = 1 - P(e_Z = 1 \mid e_R = 1 \cap e_M = 1 \cap e_C = 0 \cap U) \qquad \text{(Case D8)}$

$P(e_Z = 1 \mid e_R = 1 \cap e_M = 0 \cap e_C = 0 \cap U) = \sum_{RMC2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 1,\ e_M = 0,\ e_C = 0,\ RMC2 = i}{\#\,\text{Pairs with } e_R = 1,\ e_M = 0,\ e_C = 0} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i_k,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i_k} \right] \qquad \text{(Case D9)}$

$P(e_Z = 0 \mid e_R = 1 \cap e_M = 0 \cap e_C = 0 \cap U) = 1 - P(e_Z = 1 \mid e_R = 1 \cap e_M = 0 \cap e_C = 0 \cap U) \qquad \text{(Case D10)}$

$P(e_Z = 1 \mid e_R = 0 \cap e_M = 1 \cap e_C = 0 \cap U) = \sum_{RMC2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 0,\ e_M = 1,\ e_C = 0,\ RMC2 = i}{\#\,\text{Pairs with } e_R = 0,\ e_M = 1,\ e_C = 0} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i_k,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i_k} \right] \qquad \text{(Case D11)}$

$P(e_Z = 0 \mid e_R = 0 \cap e_M = 1 \cap e_C = 0 \cap U) = 1 - P(e_Z = 1 \mid e_R = 0 \cap e_M = 1 \cap e_C = 0 \cap U) \qquad \text{(Case D12)}$

$P(e_Z = 1 \mid e_R = 0 \cap e_M = 0 \cap e_C = 1 \cap U) = \sum_{RMC2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 0,\ e_M = 0,\ e_C = 1,\ RMC2 = i}{\#\,\text{Pairs with } e_R = 0,\ e_M = 0,\ e_C = 1} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i_k,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i_k} \right] \qquad \text{(Case D13)}$

$P(e_Z = 0 \mid e_R = 0 \cap e_M = 0 \cap e_C = 1 \cap U) = 1 - P(e_Z = 1 \mid e_R = 0 \cap e_M = 0 \cap e_C = 1 \cap U) \qquad \text{(Case D14)}$

$P(e_Z = 1 \mid e_R = 0 \cap e_M = 0 \cap e_C = 0 \cap U) = \sum_{RMC2:\, i=1}^{n} \left[ \frac{\#\,\text{Pairs with } e_R = 0,\ e_M = 0,\ e_C = 0,\ RMC2 = i}{\#\,\text{Pairs with } e_R = 0,\ e_M = 0,\ e_C = 0} \times \sum_{Z:\, j=1}^{m} \prod_{\text{Set}:\, k=1}^{2} \frac{\#\,\text{Events in Set } k \text{ with } RMC = i_k,\ Z = j}{\#\,\text{Events in Set } k \text{ with } RMC = i_k} \right] \qquad \text{(Case D15)}$

$P(e_Z = 0 \mid e_R = 0 \cap e_M = 0 \cap e_C = 0 \cap U) = 1 - P(e_Z = 1 \mid e_R = 0 \cap e_M = 0 \cap e_C = 0 \cap U) \qquad \text{(Case D16)}$
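As a rough illustration of Case D1, the Python sketch below computes $P(e_Z = 1 \mid e_R = 1 \cap e_M = 1 \cap e_C = 1 \cap U)$ from counts. It assumes that RMC denotes the combined value of the R, M and C fields (which both events of a Pair share when all three fields match) and reduces each event to a hypothetical `(rmc, z)` tuple; the function name `p_z_match_given_rmc_match` is illustrative only. The other odd-numbered cases use RMC2 and the set-specific values $i_k$, whose definitions are not restated in this section, so they are not sketched here.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

# Hypothetical event representation: (rmc_value, z_value), where rmc_value is
# the combined R/M/C field value and z_value is the value of field Z.
Event = Tuple[Hashable, Hashable]


def p_z_match_given_rmc_match(
    pair_rmc: Sequence[Hashable],
    set1: Sequence[Event],
    set2: Sequence[Event],
) -> float:
    """Sketch of Case D1: P(e_Z = 1 | e_R = 1, e_M = 1, e_C = 1, U).

    pair_rmc holds the shared RMC value of every Pair in the stratum
    e_R = e_M = e_C = 1 (assumed non-empty).
    """
    pair_counts = Counter(pair_rmc)
    total_pairs = len(pair_rmc)
    sets = (set1, set2)
    # Per-set counts of events by RMC value and by (RMC value, Z value).
    rmc_counts = [Counter(rmc for rmc, _ in s) for s in sets]
    rmc_z_counts = [Counter(s) for s in sets]

    result = 0.0
    for i, n_pairs_i in pair_counts.items():
        weight = n_pairs_i / total_pairs  # share of Pairs with RMC = i
        if rmc_counts[0][i] == 0 or rmc_counts[1][i] == 0:
            continue  # guard against inconsistent input: no events with RMC = i
        # Z values absent from Set 1 give a zero product, so iterating over
        # the Z values seen in Set 1 covers every non-zero term of the sum.
        z_values = {z for rmc, z in set1 if rmc == i}
        inner = 0.0
        for j in z_values:
            prod = 1.0
            for k in range(2):  # product over the two Source Sets
                prod *= rmc_z_counts[k][(i, j)] / rmc_counts[k][i]
            inner += prod
        result += weight * inner
    return result
```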

$P(E_G \mid U)$ can be calculated for each of the 16 possible $E_G$ vectors by first creating a 16-row × 10-column data structure, PEgDf, as shown in the following example:

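| EgIndex | er | em | ec | ez | PrU | PmrU | PcrmU | PzrmcU | PEgU |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | A1 | B1 | C1 | D1 | A1×B1×C1×D1 |
| 2 | 1 | 1 | 1 | 0 | A1 | B1 | C1 | D2 | A1×B1×C1×D2 |
| 3 | 1 | 1 | 0 | 1 | A1 | B1 | C2 | D7 | A1×B1×C2×D7 |
| 4 | 1 | 1 | 0 | 0 | A1 | B1 | C2 | D8 | A1×B1×C2×D8 |
| 5 | 1 | 0 | 1 | 1 | A1 | B2 | C3 | D5 | A1×B2×C3×D5 |
| 6 | 1 | 0 | 1 | 0 | A1 | B2 | C3 | D6 | A1×B2×C3×D6 |
| 7 | 1 | 0 | 0 | 1 | A1 | B2 | C4 | D9 | A1×B2×C4×D9 |
| 8 | 1 | 0 | 0 | 0 | A1 | B2 | C4 | D10 | A1×B2×C4×D10 |
| 9 | 0 | 1 | 1 | 1 | A2 | B3 | C5 | D3 | A2×B3×C5×D3 |
| 10 | 0 | 1 | 1 | 0 | A2 | B3 | C5 | D4 | A2×B3×C5×D4 |
| 11 | 0 | 1 | 0 | 1 | A2 | B3 | C6 | D11 | A2×B3×C6×D11 |
| 12 | 0 | 1 | 0 | 0 | A2 | B3 | C6 | D12 | A2×B3×C6×D12 |
| 13 | 0 | 0 | 1 | 1 | A2 | B4 | C7 | D13 | A2×B4×C7×D13 |
| 14 | 0 | 0 | 1 | 0 | A2 | B4 | C7 | D14 | A2×B4×C7×D14 |
| 15 | 0 | 0 | 0 | 1 | A2 | B4 | C8 | D15 | A2×B4×C8×D15 |
| 16 | 0 | 0 | 0 | 0 | A2 | B4 | C8 | D16 | A2×B4×C8×D16 |

The first five columns (EgIndex, er, em, ec, and ez) simply list and index the sixteen possible combinations of the four binary variables that make up $e_G$. The formulae for calculating the values that go in columns six through nine (PrU, PmrU, PcrmU, and PzrmcU) are provided above. Each entry in those columns references the equation from above that is used to compute it; for example, A1 refers to the equation provided in Case A1 above, while B4 refers to the equation provided in Case B4 above. The last column, PEgU, equals the product of the values in columns six through nine. For the above calculations, $P(e_i = 1 \mid U)$ is defined for each Pair Field $i$ as shown in Equation 5: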
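The chain-rule structure of the PEgDf table can be sketched in Python. The sketch assumes the four conditional probabilities $P(e_R = 1 \mid U)$, $P(e_M = 1 \mid e_R, U)$, $P(e_C = 1 \mid e_R, e_M, U)$ and $P(e_Z = 1 \mid e_R, e_M, e_C, U)$ have already been computed (for example with the Case A through D equations) and are passed in as hypothetical dictionaries; each "0" outcome is taken as the complement of the corresponding "1" outcome, as the even-numbered D cases do for field Z. The function name `build_peg_table` is illustrative only, and rows are generated in the same order as the EgIndex column.

```python
from itertools import product
from typing import Dict, List, Tuple


def build_peg_table(
    p_r: float,
    p_m: Dict[int, float],
    p_c: Dict[Tuple[int, int], float],
    p_z: Dict[Tuple[int, int, int], float],
) -> List[Tuple[int, int, int, int, int, float]]:
    """Return rows (EgIndex, er, em, ec, ez, PEgU) in the table's order.

    p_r: P(e_R = 1 | U)
    p_m[er]: P(e_M = 1 | e_R = er, U)
    p_c[(er, em)]: P(e_C = 1 | e_R = er, e_M = em, U)
    p_z[(er, em, ec)]: P(e_Z = 1 | e_R = er, e_M = em, e_C = ec, U)
    """

    def bern(p_one: float, value: int) -> float:
        # Probability of `value` for a binary variable with P(value = 1) = p_one.
        return p_one if value == 1 else 1.0 - p_one

    rows = []
    # product((1, 0), repeat=4) yields (1,1,1,1), (1,1,1,0), ..., (0,0,0,0),
    # which matches EgIndex 1..16.
    for idx, (er, em, ec, ez) in enumerate(product((1, 0), repeat=4), start=1):
        pr_u = bern(p_r, er)
        pmr_u = bern(p_m[er], em)
        pcrm_u = bern(p_c[(er, em)], ec)
        pzrmc_u = bern(p_z[(er, em, ec)], ez)
        # PEgU is the product of columns six through nine.
        rows.append((idx, er, em, ec, ez, pr_u * pmr_u * pcrm_u * pzrmc_u))
    return rows
```

Because every factor is a proper conditional probability, the sixteen PEgU values sum to 1, which is a convenient sanity check on an implementation.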

$P(e_i = 1 \mid U) = \sum_{k=1}^{n} \left( \text{frequency of value } k \text{ in source set 1, field } i \right) \times \left( \text{frequency of value } k \text{ in source set 2, field } i \right) \qquad \text{(Equation 5)}$

where the sum runs over the $n$ unique values appearing in field $i$ across both source sets.
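Equation 5 can be read as the chance that two unrelated events, one drawn from each source set, happen to agree on field $i$. A minimal Python sketch, assuming "frequency" means relative frequency (the share of events in a source set carrying value $k$) and using the hypothetical function name `p_field_match_given_unmatch`:

```python
from collections import Counter
from typing import Hashable, Sequence


def p_field_match_given_unmatch(
    field_set1: Sequence[Hashable], field_set2: Sequence[Hashable]
) -> float:
    """Equation 5 sketch: P(e_i = 1 | U) for one Pair Field i.

    field_set1 / field_set2 hold the values of field i for every event in
    source set 1 and source set 2 (both assumed non-empty).
    """
    freq1 = Counter(field_set1)
    freq2 = Counter(field_set2)
    n1, n2 = len(field_set1), len(field_set2)
    # Sum over the unique values k; a value absent from one set contributes
    # zero, so iterating over the values of set 1 is sufficient.
    return sum((c / n1) * (freq2[k] / n2) for k, c in freq1.items())
```

For example, field values `["US", "US", "CA", "MX"]` in set 1 and `["US", "CA", "CA", "CA"]` in set 2 give 0.5 × 0.25 + 0.25 × 0.75 = 0.3125.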

$P(U)$ is the probability that the Pair in question is an Unmatch, with no additional information or condition, taken with a frequentist approach. Over a large data set, this equals one minus the number of real-world Match Pairs divided by the total number of Pairs considered. The number of Matches, and therefore of Unmatches, in every unit group is known, because each event from the smaller source set belongs to exactly one Match Pair. Therefore:

$P(U) = 1 - \frac{\text{number of Matches in unit group}}{\text{number of Pairs in group}} = 1 - \frac{\text{number of events in smaller source set}}{\text{number of events in smaller source set} \times \text{number of events in larger source set}} = 1 - \frac{1}{\text{number of events in larger source set}}$
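A short numeric check of the expression above: with 3 events in the smaller source set and 4 in the larger, there are 12 Pairs and 3 Matches, so $P(U) = 1 - 3/12 = 0.75 = 1 - 1/4$. The sketch below (hypothetical function name `p_unmatch`) assumes the full cross product of Pairs within the unit group, with no Pairs removed by a time-window filter.

```python
def p_unmatch(n_small: int, n_large: int) -> float:
    """P(U) for one unit group.

    Assumes every one of the n_small events in the smaller source set has
    exactly one Match among the n_small * n_large Pairs (no Pairs filtered).
    """
    n_pairs = n_small * n_large
    n_matches = n_small
    return 1.0 - n_matches / n_pairs  # equals 1 - 1 / n_large
```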

The above Bayesian inference allows for the calculation of a probability of Match for an individual Pair based solely on the row of Pair Fields corresponding to the two events that make up that Pair. The probability value it produces (P1) is the best estimate based solely on that information. It is understood, however, that for each event from the smaller Source Set, precisely one Pair including that event will be a Match in the real-world sense (the set of all such Pairs including the same event may be referred to as a “Pair Cohort”). Therefore:

$\sum_{j=1}^{n} P(\text{pair}_{i,j} \text{ is Match}) = 1$

where $\text{pair}_{i,j}$ is the pair of Event $i$ from the smaller Source Set and Event $j$ from the larger Source Set, $i$ is the indicator of a single event from the smaller Source Set, and $n$ is the number of Events in the larger Source Set.

This property is reflected in part in the prior probability $P(H) = P(M) = P3$ in the Bayesian analysis. The Bayesian analysis does not reflect, however, that precisely one of the Pairs corresponding to each Event is a Match and all of the others are not. The single row for each Event with the highest P1 value can be selected and designated as the single Match among all rows corresponding to that Event. More precisely, the probability measure can be calculated with the following equation:

$P1^* = P(M \mid P1 \text{ and precisely one Pair is M for all Pairs incorporating a single Event})$

This value equals the following, referred to as Equation 6:

$\frac{P1_i \times \prod_{j \neq i} (1 - P1_j)}{P1_i \times \prod_{j \neq i} (1 - P1_j) + \sum_{j \neq i} \left( P1_j \times \prod_{k \neq j} (1 - P1_k) \right)} = P1_i \times \frac{\prod_{j \neq i} (1 - P1_j)}{P1_i \times \prod_{j \neq i} (1 - P1_j) + \sum_{j \neq i} \left( P1_j \times \prod_{k \neq j} (1 - P1_k) \right)} = P1_i^* \qquad \text{(Equation 6)}$

where $P1_i$ is the P1 value of the Pair in question, $j \neq i$ denotes all Pairs in the same Pair Cohort as Pair $i$ other than Pair $i$, and $k \neq j$ denotes all Pairs in the same Pair Cohort as Pair $i$ other than Pair $j$, including Pair $i$.
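Equation 6 normalizes, within one Pair Cohort, the weight of each state in which exactly one Pair is the Match. A minimal Python sketch under that reading (the function names `p1_star` and `select_match` are hypothetical), which also shows the simpler rule of designating the highest-scoring Pair as the Match:

```python
import math
from typing import List, Sequence


def p1_star(p1: Sequence[float]) -> List[float]:
    """Equation 6 sketch: P1* for every Pair in one Pair Cohort.

    p1[i] is the P1 value of Pair i; all Pairs in the list share one event
    from the smaller Source Set.
    """
    # Weight of the state "Pair l is the Match and every other Pair is not".
    weights = [
        p1[l] * math.prod(1.0 - p1[m] for m in range(len(p1)) if m != l)
        for l in range(len(p1))
    ]
    total = sum(weights)  # denominator of Equation 6
    if total == 0.0:
        raise ValueError("no state with exactly one Match has positive weight")
    return [w / total for w in weights]


def select_match(p1: Sequence[float]) -> int:
    """Index of the Pair with the highest P1*.

    When all P1 values are below 1 this is also the Pair with the highest P1,
    because P1* is proportional to the odds P1 / (1 - P1).
    """
    stars = p1_star(p1)
    return max(range(len(stars)), key=stars.__getitem__)
```

By construction the P1* values of a Pair Cohort sum to 1, mirroring the constraint that exactly one Pair per smaller-set event is a Match.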

This is the probability of the state in which the given Pair is a Match (by P1) while all other Pairs including the given event from the smaller source set are Unmatches (also by P1), as a proportion of the state space formed by the sum of all states in which exactly one Pair from the Pair Cohort is a Match. Within each Pair Cohort, P1* values will migrate from the P1 values toward 1 and 0. This makes Match Pairs stand out (they migrate toward 1 while all other Pairs migrate toward 0), and it makes it easier to determine when errors in the data exclude a Match for a given event, because P1* for the “strongest” candidate Pair in that Pair Cohort will not migrate toward 1 as strongly as expected.

The quantity

$\frac{\prod_{j \neq i} (1 - P1_j)}{P1_i \times \prod_{j \neq i} (1 - P1_j) + \sum_{j \neq i} \left( P1_j \times \prod_{k \neq j} (1 - P1_k) \right)}$

from Equation 6 may be referred to as the “Sole Rightful Heir Factor,” and it can be simplified as:

$\frac{\prod_{j \neq i} (1 - P1_j)}{\sum_{l=1}^{n} \left( P1_l \times \prod_{m \neq l} (1 - P1_m) \right)}$

where $j \neq i$ still denotes all Pairs in Pair $i$'s Pair Cohort other than Pair $i$, $n$ is the total number of Pairs in the Pair Cohort, $l$ is the index over all Pairs in that Pair Cohort, and $m \neq l$ denotes all Pairs in the Pair Cohort other than Pair $l$.
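The simplification holds because the first term of the original denominator, $P1_i \times \prod_{j \neq i}(1 - P1_j)$, is exactly the $l = i$ term of the sum, so both denominators are the same total over all Pairs in the Pair Cohort. A Python sketch (hypothetical function names) that computes the factor and recovers $P1_i^*$ from it:

```python
import math
from typing import Sequence


def heir_factor(p1: Sequence[float], i: int) -> float:
    """Simplified Sole Rightful Heir Factor for Pair i of one Pair Cohort."""
    n = len(p1)
    numerator = math.prod(1.0 - p1[j] for j in range(n) if j != i)
    # Sum over every Pair l of P1_l times the product over m != l of (1 - P1_m).
    denominator = sum(
        p1[l] * math.prod(1.0 - p1[m] for m in range(n) if m != l)
        for l in range(n)
    )
    return numerator / denominator


def p1_star_via_factor(p1: Sequence[float], i: int) -> float:
    # Equation 6 rewritten as P1*_i = P1_i x (Sole Rightful Heir Factor).
    return p1[i] * heir_factor(p1, i)
```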

The above calculation gives a probability value for each Pair being a Match, but it must still be determined what level of that probability should cause the Pair to be treated as a Match versus an Unmatch. That level is called the “Decision Threshold.” The migration of P1* toward extreme values enhances the ability to select a reliable Decision Threshold by creating a wider “street” separating the “high” P1* values for each Small Source event from the “low” values. A simple unsupervised clustering classification model such as K-means, trained on the P1* values, will thus produce a robust boundary between such values and allow a confidence level to be determined for the Match and Unmatch determinations.
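The Decision Threshold step can be sketched with a two-cluster, one-dimensional K-means written in plain Python. Initializing the centroids at the smallest and largest P1* values and taking the midpoint between the final centroids as the Decision Threshold are assumptions of this sketch, not details given in the text; a library implementation (for example scikit-learn's `KMeans`) could be used instead. The function name `kmeans_threshold` is hypothetical.

```python
from typing import Sequence, Tuple


def kmeans_threshold(
    values: Sequence[float], iterations: int = 100
) -> Tuple[float, float, float]:
    """Two-cluster 1-D K-means on P1* values.

    Returns (low_centroid, high_centroid, threshold), where the threshold is
    the midpoint between the two centroids, i.e. the point at which a value
    switches from the "low" cluster to the "high" cluster.
    """
    low, high = min(values), max(values)  # initialize at the extremes
    if low == high:
        raise ValueError("need at least two distinct values")
    for _ in range(iterations):
        midpoint = (low + high) / 2.0
        # Assign each value to its nearer centroid; both groups stay non-empty
        # because the minimum is always below and the maximum always at or
        # above the midpoint.
        low_group = [v for v in values if v < midpoint]
        high_group = [v for v in values if v >= midpoint]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):
            break  # assignments have converged
        low, high = new_low, new_high
    return low, high, (low + high) / 2.0
```

A wider gap between the two clusters (the “street”) leaves the threshold well away from both groups, which is what makes the boundary robust.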

It may be understood that the present invention as described above may be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art may know and appreciate other ways and/or methods to implement the present invention using hardware, software, or a combination of hardware and software.

The above description is illustrative and is not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of the disclosure. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the pending claims along with their full scope or equivalents.

One or more features from any embodiment may be combined with one or more features of any other embodiment without departing from the scope of the invention. A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. Recitation of “and/or” is intended to represent the most inclusive sense of the term unless specifically indicated to the contrary.

One or more of the elements of the present system may be claimed as means for accomplishing a particular function. Where such means-plus-function elements are used to describe certain elements of a claimed system, it will be understood by those of ordinary skill in the art having the present specification, figures and claims before them, that the corresponding structure is a general purpose computer, processor, or microprocessor (as the case may be) programmed to perform the particularly recited function using functionality found in any general purpose computer without special programming and/or by implementing one or more algorithms to achieve the recited functionality. As would be understood by those of ordinary skill in the art, that algorithm may be expressed within this disclosure as a mathematical formula, a flow chart, a narrative, and/or in any other manner that provides sufficient structure for those of ordinary skill in the art to implement the recited process and its equivalents.

While the present disclosure may be embodied in many different forms, the drawings and discussion are presented with the understanding that the present disclosure is an exemplification of the principles of one or more inventions and is not intended to limit any one of the inventions to the embodiments illustrated.

Further advantages and modifications of the above-described system and method will readily occur to those skilled in the art. The disclosure, in its broader aspects, is therefore not limited to the specific details, representative system and methods, and illustrative examples shown and described above. Various modifications and variations can be made to the above specification without departing from the scope or spirit of the present disclosure, and it is intended that the present disclosure covers all such modifications and variations provided they come within the scope of the following claims and their equivalents.

The invention claimed is:
 1. A computer-implemented method for accurately matching corresponding demand-side platform (DSP) event data and Ad-Server event data associated with a single real-world ad serve event, the method comprising the steps of: a. at a common server, receiving from the DSP a set of DSP event data from a DSP source set, each piece of DSP event data comprising a series of DSP source fields having a field data value and wherein at least two of the DSP event data are DSP geographic event data; b. at the common server, receiving from an ad server a set of Ad-Server event data from an Ad-Server source set, each piece of Ad-Server event data comprising a series of Ad-Server source fields having a field data value and wherein at least two of the Ad-Server event data are Ad-Server geographic event data; c. combining each piece of DSP geographic event data into a combined DSP comparison field, combining each piece of Ad-Server event data into a combined Ad-Server comparison field, and pairing each piece of DSP event data with each piece of Ad-Server event data to create a data matrix stored at the common server, wherein the data matrix comprises a plurality of DSP-Ad Server event data pairs; d. for each of the plurality of DSP-Ad Server event data pairs in the data matrix: i. comparing the data value of each DSP source field with the data value of each corresponding Ad-Server source field to create a set of pair fields; ii. assigning a Pair Attribute to each pair field in the set of pair fields, wherein the Pair Attribute for each pair field comprises a first value attribute and a second value attribute, wherein the first value attribute comprises a Boolean attribute indicating one of (a) a field data match of the pair field and (b) a field data unmatch of the pair field, and wherein the second value attribute comprises one of (a) the data value of the pair field if the field data is a match and (b) a null value if the field data is an un-match; and iii. using the assigned Pair Attributes in the set of pair fields to determine the probability that the DSP event data and the Ad-Server event data of the particular DSP-Ad Server event data pair are both associated with a single real-world ad serve event using a Bayesian analysis according to $\frac{P1_i \times \prod_{j \neq i} (1 - P1_j)}{P1_i \times \prod_{j \neq i} (1 - P1_j) + \sum_{j \neq i} \left( P1_j \times \prod_{k \neq j} (1 - P1_k) \right)} = P1_i \times \frac{\prod_{j \neq i} (1 - P1_j)}{P1_i \times \prod_{j \neq i} (1 - P1_j) + \sum_{j \neq i} \left( P1_j \times \prod_{k \neq j} (1 - P1_k) \right)} = P1_i^*$ where $P1_i$ is the probability value of each pair field, j ≠ i denotes all pairs in a same pair cohort as pair i other than pair i, and k ≠ j denotes all pairs in the same pair cohort as pair i other than pair j and including pair i; e. determining a decision threshold by a clustering classification model; and f. applying the decision threshold to the probability for each pair field in the set of pair fields to determine whether a pair field represents a match.
 2. The method of claim 1, wherein the DSP source fields comprise at least one of (a) a timestamp, (b) location information, (c) operating system information, and (d) website information.
 3. The method of claim 2, wherein the Ad-Server source fields comprise at least one of (a) a timestamp, (b) location information, (c) operating system information, and (d) website information.
 4. The method of claim 1, further comprising the step of, prior to creating the event data pairs, applying a filter to reduce the number of event data pairs to be created.
 5. The method of claim 4, wherein applying the filter comprises the steps of: a. segregating the DSP event data into a plurality of unit groups; b. segregating the Ad-Server event data into the unit groups; c. defining a time-difference window; and d. for each of the unit groups, pairing the DSP event data in the unit group and the Ad-Server event data in the unit group and filtering out all pairs that fall outside of the time-difference window.
 6. The method of claim 1, wherein the Boolean value of the Pair Attribute indicating a field data match is 1 and the Boolean value of a Pair Attribute indicating a field data unmatch is 0.
 7. A system for matching data associated with an ad serve event, the system comprising: a demand-side platform (DSP) comprising one or more computer readable storage devices configured to store a plurality of DSP executable instructions and further comprising one or more DSP hardware computer processors in communication with the one or more DSP computer readable storage devices and configured to execute the plurality of DSP computer executable instructions in order to cause the demand-side platform to generate a set of DSP event data, each piece of DSP event data comprising a series of DSP source fields having a DSP field data value, wherein at least two of the DSP event data are DSP geographic event data; an ad server comprising one or more ad server computer readable storage devices configured to store a plurality of ad server executable instructions and further comprising one or more ad server hardware computer processors in communication with the one or more ad server computer readable storage devices and configured to execute the plurality of ad server computer executable instructions in order to cause the ad server to generate a set of ad server event data, each piece of ad server event data comprising a series of ad server source fields having an ad server field data value, wherein at least two of the ad server event data are ad server geographic event data; a common server comprising one or more common server computer readable storage devices configured to store a plurality of common server executable instructions and further comprising one or more common server hardware computer processors in communication with the one or more common server computer readable storage devices and configured to execute the plurality of common server computer executable instructions in order to cause the common server to receive from the DSP the set of DSP event data and to receive from the ad server the set of ad server event data, combine each piece of DSP geographic event data into a combined DSP comparison field, combine each piece of ad server event data into a combined ad server comparison field, pair each piece of DSP event data with each piece of ad server event data to create a data matrix wherein the data matrix comprises a plurality of DSP to ad server event data pairs, store the data matrix at the common server computer readable storage devices, and, for each of the plurality of DSP-Ad Server event data pairs in the data matrix, compare the data value of each DSP source field with the data value of each corresponding ad server source field to create a set of pair fields, assign a pair attribute to each pair field in the set of pair fields, wherein the pair attribute for each pair field comprises a first value attribute and a second value attribute, wherein the first value attribute comprises a Boolean attribute indicating one of (a) a field data match of the pair field and (b) a field data unmatch of the pair field, and wherein the second value attribute comprises one of (a) the data value of the pair field if the field data is a match and (b) a null value if the field data is an un-match, use the assigned Pair Attributes in the set of pair fields to determine the probability that the DSP event data and the ad server event data of the particular DSP to ad server event data pair are both associated with a single real-world ad serve event using a Bayesian analysis according to $\frac{P1_i \times \prod_{j \neq i} (1 - P1_j)}{P1_i \times \prod_{j \neq i} (1 - P1_j) + \sum_{j \neq i} \left( P1_j \times \prod_{k \neq j} (1 - P1_k) \right)} = P1_i \times \frac{\prod_{j \neq i} (1 - P1_j)}{P1_i \times \prod_{j \neq i} (1 - P1_j) + \sum_{j \neq i} \left( P1_j \times \prod_{k \neq j} (1 - P1_k) \right)} = P1_i^*$ where $P1_i$ is the probability value of each pair field, j ≠ i denotes all pairs in a same pair cohort as pair i other than pair i, and k ≠ j denotes all pairs in the same pair cohort as pair i other than pair j and including pair i, determine a decision threshold by a clustering classification model, and apply the decision threshold to the probability for each pair field in the set of pair fields to determine whether a pair field represents a match.
 8. The system of claim 7, wherein the DSP source fields comprise at least one of (a) a timestamp, (b) location information, (c) operating system information, and (d) website information.
 9. The system of claim 8, wherein the ad server source fields comprise at least one of (a) a timestamp, (b) location information, (c) operating system information, and (d) website information.
 10. The system of claim 7, wherein the one or more common server hardware computer processors are further configured to execute the plurality of common server computer executable instructions in order to create the event data pairs by applying a filter to reduce the number of event data pairs to be created.
 11. The system of claim 10, wherein applying the filter comprises segregating the DSP event data into a plurality of unit groups, segregating the ad server event data into the unit groups, defining a time-difference window, and, for each of the unit groups, pairing the DSP event data in the unit group and the ad server event data in the unit group and filtering out all pairs that fall outside of the time-difference window.
 12. The system of claim 7, wherein the Boolean value of the pair attribute indicating a field data match is 1 and the Boolean value of a pair attribute indicating a field data unmatch is 0.
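The pre-pairing filter of claims 5 and 11 and the Pair Attributes of claims 1 and 6 can be sketched together in Python. The `Event` record, identifying corresponding source fields by a shared field name, timestamps expressed in seconds, and treating the time-difference window as a maximum absolute timestamp difference are all assumptions made for illustration; the function names `candidate_pairs` and `pair_attributes` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Hashable, List, Optional, Tuple


@dataclass
class Event:
    # Hypothetical event record: unit-group key, timestamp in seconds, and
    # source fields keyed by a field name shared by both source sets.
    unit_group: Hashable
    timestamp: float
    fields: Dict[str, Hashable]


PairAttribute = Tuple[int, Optional[Hashable]]  # (match flag 1/0, value or None)


def candidate_pairs(
    dsp_events: List[Event], ad_events: List[Event], window: float
) -> List[Tuple[Event, Event]]:
    """Pair DSP and Ad-Server events within each unit group, keeping only
    Pairs whose timestamps differ by at most `window` seconds."""
    groups: Dict[Hashable, Tuple[List[Event], List[Event]]] = {}
    for e in dsp_events:
        groups.setdefault(e.unit_group, ([], []))[0].append(e)
    for e in ad_events:
        groups.setdefault(e.unit_group, ([], []))[1].append(e)
    pairs = []
    for dsp_group, ad_group in groups.values():
        for d, a in product(dsp_group, ad_group):
            if abs(d.timestamp - a.timestamp) <= window:
                pairs.append((d, a))
    return pairs


def pair_attributes(d: Event, a: Event) -> Dict[str, PairAttribute]:
    """Pair Attribute per shared field: (1, value) on a match, (0, None) otherwise."""
    attrs: Dict[str, PairAttribute] = {}
    for name in d.fields.keys() & a.fields.keys():
        if d.fields[name] == a.fields[name]:
            attrs[name] = (1, d.fields[name])
        else:
            attrs[name] = (0, None)
    return attrs
```

Grouping before pairing means the cross product is only formed within each unit group, which is what keeps the number of candidate Pairs manageable.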